Singular Value Perturbation and Deep Network Optimization
Authors
Abstract
We develop new theoretical results on matrix perturbation to shed light on the impact of architecture on the performance of a deep network. In particular, we explain analytically what deep learning practitioners have long observed empirically: the parameters of some architectures (e.g., residual networks, ResNets, and dense networks, DenseNets) are easier to optimize than others (e.g., convolutional networks, ConvNets). Building on our earlier work connecting deep networks with continuous piecewise-affine splines, we develop an exact local linear representation of a network layer for a family of modern networks that includes ConvNets at one end of the spectrum and DenseNets and other networks with skip connections at the other. For regression and classification tasks that optimize the squared-error loss, we show that the optimization loss surface is piecewise quadratic in the parameters, with local shape governed by the singular values of a matrix that is a function of the local linear representation. We develop new perturbation results for how the singular values of matrices of this sort behave as we add a fraction of the identity and multiply by certain diagonal matrices. A direct application explains analytically why a network with skip connections (such as a ResNet or DenseNet) is easier to optimize than a ConvNet: thanks to its more stable singular values and smaller condition number, the local loss surface of such a network is less erratic, less eccentric, and features local minima that are more accommodating to gradient-based optimization. Our results also shed new light on the impact of different nonlinear activation functions on a network's singular values, regardless of architecture.
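A minimal numerical sketch of the headline effect (an illustration under assumed settings, not the paper's construction): adding the identity to a layer's local linear map, as a skip connection does, keeps the smallest singular value away from zero and lowers the condition number. The random matrix W, the size n, and the scaling are hypothetical choices for illustration.

```python
# Hedged illustration: compare the singular-value spread of a plain linear map W
# with that of I + W, a stand-in for a layer with a skip connection.
import numpy as np

rng = np.random.default_rng(0)
n = 200
W = 0.5 * rng.standard_normal((n, n)) / np.sqrt(n)   # assumed stand-in for a ConvNet layer's local linear map
W_skip = np.eye(n) + W                               # assumed stand-in for a ResNet-style layer

for name, M in [("plain layer W", W), ("skip layer I + W", W_skip)]:
    s = np.linalg.svd(M, compute_uv=False)           # singular values, in descending order
    print(f"{name:18s} sigma_max={s[0]:.3f}  sigma_min={s[-1]:.2e}  condition number={s[0] / s[-1]:.1f}")
```

On a typical draw, the smallest singular value of W is tiny while that of I + W stays well away from zero, so the condition number drops sharply, mirroring the qualitative claim above.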
منابع مشابه
Relative Perturbation Theory: I. Eigenvalue and Singular Value Variations
The classical perturbation theory for Hermitian matrix eigenvalue and singular value problems provides bounds on the absolute differences between approximate eigenvalues (singular values) and the true eigenvalues (singular values) of a matrix. These bounds may be bad news for small eigenvalues (singular values), which thereby suffer worse relative uncertainty than large ones. However, there are...
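A short numerical illustration of that point (my own example, not from the cited paper), using Weyl's inequality for singular values, |sigma_i(A+E) - sigma_i(A)| <= ||E||_2: the absolute changes are uniformly bounded by the perturbation norm, but the relative change for a small singular value can be far larger than for a large one. The matrix A and the perturbation size are assumed values.

```python
# Hedged illustration: absolute vs. relative singular-value perturbation.
import numpy as np

A = np.diag([1.0, 1e-3, 1e-6])           # exact singular values: 1, 1e-3, 1e-6
rng = np.random.default_rng(1)
E = rng.standard_normal(A.shape)
E *= 1e-4 / np.linalg.norm(E, 2)         # rescale so the spectral norm of E is 1e-4

s_true = np.linalg.svd(A, compute_uv=False)
s_pert = np.linalg.svd(A + E, compute_uv=False)

for st, sp in zip(s_true, s_pert):
    # Each |change| is at most ||E||_2 = 1e-4, but the relative change blows up
    # for the smallest singular value.
    print(f"sigma={st:.1e}  |change|={abs(sp - st):.2e}  relative change={abs(sp - st) / st:.2e}")
```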
Updating Singular Value Decomposition for Rank One Matrix Perturbation
An efficient Singular Value Decomposition (SVD) algorithm is an important tool for distributed and streaming computation in big data problems. It is observed that updating the singular vectors of a rank-1 perturbed matrix is similar to a Cauchy matrix-vector product. With this observation, in this paper we present an efficient method for updating the Singular Value Decomposition of a rank-1 perturbed ma...
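A hedged sketch of the structure behind that observation, shown for the standard symmetric special case rather than the cited paper's SVD algorithm: the eigenvectors of a diagonal matrix D plus a rank-one term are proportional to (D - lambda_i I)^{-1} z, so assembling all of them is a Cauchy-like computation with entries z_j / (d_j - lambda_i). All sizes and values below are assumptions for illustration.

```python
# Hedged illustration: rank-one update of a diagonal matrix and the
# Cauchy-like formula for the updated eigenvectors.
import numpy as np

rng = np.random.default_rng(2)
n = 6
d = np.sort(rng.uniform(0.0, 10.0, n))   # distinct diagonal entries
z = rng.standard_normal(n)
rho = 0.7

A = np.diag(d) + rho * np.outer(z, z)    # rank-one update of a diagonal matrix
lam, V = np.linalg.eigh(A)               # reference eigendecomposition

for i, l in enumerate(lam):
    v = z / (d - l)                      # (D - lambda_i I)^{-1} z, entrywise
    v /= np.linalg.norm(v)
    # Same direction as the reference eigenvector, up to sign: alignment ~ 1.
    print(f"lambda={l: .4f}  alignment={abs(v @ V[:, i]):.6f}")
```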
Restructuring of deep neural network acoustic models with singular value decomposition
Recently proposed deep neural networks (DNNs) obtain significant accuracy improvements in many large-vocabulary continuous speech recognition (LVCSR) tasks. However, DNNs require many more parameters than traditional systems, which incurs a huge cost during online evaluation and also limits the application of DNNs in many scenarios. In this paper we present our new effort on DNNs aiming at redu...
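A minimal sketch of the general idea (not the cited paper's specific recipe): factor a large weight matrix with a truncated SVD into two thin matrices, cutting the parameter count. The layer sizes and kept rank k are hypothetical; on a random matrix this only demonstrates the mechanics and the bookkeeping, whereas trained acoustic-model weights typically have fast-decaying singular values, so a modest k loses little accuracy.

```python
# Hedged illustration: SVD-based low-rank restructuring of a weight matrix.
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 1024, 1024, 128                # assumed layer size and kept rank
W = rng.standard_normal((m, n))          # stand-in for a trained weight matrix

U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :k] * s[:k]                     # m x k factor (left singular vectors scaled by singular values)
B = Vt[:k, :]                            # k x n factor; W is replaced by A @ B

params_before = m * n
params_after = m * k + k * n
rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"parameters: {params_before} -> {params_after}  relative error: {rel_err:.3f}")
```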
Singular perturbation theory
When we apply the steady-state approximation (SSA) in chemical kinetics, we typically argue that some of the intermediates are highly reactive, so that they are removed as fast as they are made. We then set the corresponding rates of change to zero. What we are saying is not that these rates are identically zero, of course, but that they are much smaller than the other rates of reaction. The st...
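A small numerical check of that reasoning (my own illustration, not from the cited text), for the reaction A -> B -> C with a highly reactive intermediate B (k2 >> k1): after a brief transient, the computed [B] tracks the steady-state value k1*[A]/k2 even though d[B]/dt is not exactly zero. The rate constants are assumed.

```python
# Hedged illustration: steady-state approximation for A -> B -> C.
import numpy as np
from scipy.integrate import solve_ivp

k1, k2 = 1.0, 100.0                      # assumed rate constants, k2 >> k1

def rhs(t, y):
    a, b, c = y
    return [-k1 * a, k1 * a - k2 * b, k2 * b]

sol = solve_ivp(rhs, (0.0, 3.0), [1.0, 0.0, 0.0], rtol=1e-9, atol=1e-12,
                t_eval=np.linspace(0.1, 3.0, 6))   # sample after the initial transient

for t, a, b in zip(sol.t, sol.y[0], sol.y[1]):
    # Computed [B] versus the steady-state estimate k1*[A]/k2.
    print(f"t={t:.2f}  [B]={b:.3e}  SSA k1*[A]/k2={k1 * a / k2:.3e}")
```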
Journal
Journal title: Constructive Approximation
Year: 2022
ISSN: 0176-4276, 1432-0940
DOI: https://doi.org/10.1007/s00365-022-09601-5